AMENDMENTS TO THE CLAIMS

This listing of claims will replace all prior versions, and listings, of claims in the 
application. 

Listing of Claims: 

1. (Previously presented): A computer implemented system that facilitates
building a statistical model for a computer readable data set, comprising: 

a first training algorithm that efficiently builds a rough model from a 
subset of the computer readable data set; 

an evaluation component that determines whether the subset of the 
computer readable data set is an appropriate subset to build a model for the computer 
readable data set; and 

a second training algorithm that builds a refined model for the computer 
readable data set from the subset if deemed appropriate by the evaluation component. 

2. (Previously presented): The system of claim 1, further comprising a data 
scheduler which, based on a data policy, controls the size of subsets for which the first 
training algorithm is applied. 
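
For illustration only, a data policy of the kind recited in claim 2 might be sketched in Python as a geometric schedule of subset sizes. The name `doubling_schedule`, its parameters, and the growth factor of 2 are assumptions made for this sketch; the claims do not prescribe any particular policy.

```python
from typing import Iterator


def doubling_schedule(total: int, initial: int, factor: float = 2.0) -> Iterator[int]:
    """Yield aggregate subset sizes initial, initial*factor, ... capped at total."""
    if total <= 0 or initial <= 0 or factor <= 1.0:
        raise ValueError("total and initial must be positive and factor must exceed 1")
    size = min(initial, total)
    while True:
        yield size
        if size >= total:
            return  # the whole data set has been offered; nothing larger exists
        # Grow geometrically, always by at least one record, never past total.
        size = min(total, max(size + 1, int(size * factor)))


sizes = list(doubling_schedule(total=1000, initial=100))
print(sizes)
```

Each yielded size would be the size of the next larger aggregate subset handed to the first training algorithm.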

3. (Currently amended): The system of claim 2, wherein the data scheduler 
increases the size of the subset to provide a larger aggregate subset of the data set if the 
rough model is unacceptable, the first training algorithm efficiently builds the rough 
model for each larger aggregate subset of the data until the evaluation component 
determines the resulting rough model to be acceptable. 

4. (Currently amended): The system of claim 3, wherein the acceptability of
each rough model is determined based on a stopping criterion functionally related to an 
expected incremental benefit and a cost associated with increasing the size of the 
aggregate subset of the data set. 

5. (Currently amended): The system of claim 4, wherein the cost of the
stopping criterion is functionally related to at least one of time associated with evaluating 
an aggregate data subset of increased size and size of the aggregated subset of the data. 

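6. (Currently amended): The system of claim 4, wherein the stopping
criterion is defined by

$$
\left( \frac{l(D_{ho} \mid \theta(D_n)) - l(D_{ho} \mid \theta(D_{n-1}))}{l(D_{ho} \mid \theta(D_n)) - l(D_{ho} \mid \theta_{base}(D_n))} \right) \cdot \frac{1}{c_1 (I_1 - \bar{J}_n)\,|\Delta D_{n+1}| + c_2 (I_1 - \bar{J}_n) + c_1 \bar{J}_n\,|D_{n+1}| + c_2 \bar{J}_n + c_3} < \lambda
$$
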

where 

$l(D_{ho} \mid \theta(D_n))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a current subset of the training data set,

$l(D_{ho} \mid \theta(D_{n-1}))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a previous subset of the training data set,

$l(D_{ho} \mid \theta_{base}(D_n))$ is a log likelihood for holdout data evaluated for a base
model,

$c_1$, $c_2$, and $c_3$ are constants determined based on application of the second
training algorithm relative to a first subset of the data set,

$I_1$ is a number of iterations for the second training algorithm, when applied
to the first subset,

$J_i$ is the number of iterations for the first training algorithm when applied
to a data subset $D_i$,

$|D_{n+1}|$ is the size of data set $D_{n+1}$,

$|\Delta D_{n+1}|$ is the increment in size $|D_{n+1}| - |D_n|$, and

$\lambda$ is a user determined stopping threshold.
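
As a hedged illustration, the stopping criterion can be read as a relative benefit divided by a cost term and compared with the threshold. The Python sketch below assumes that reading: the equation is only partly legible in the source, so the exact arrangement of terms, the function names, and the argument names are assumptions, and `j_bar` is assumed to be the average number of first-algorithm iterations over the subsets seen so far.

```python
def stopping_ratio(ll_curr: float, ll_prev: float, ll_base: float,
                   c1: float, c2: float, c3: float,
                   i1: float, j_bar: float,
                   next_size: int, increment: int) -> float:
    """Relative holdout improvement per unit of estimated additional cost."""
    gap_to_base = ll_curr - ll_base
    if gap_to_base == 0:
        raise ValueError("current model scores no better than the base model")
    # Benefit: gain over the previous subset, relative to the gain over the base model.
    benefit = (ll_curr - ll_prev) / gap_to_base
    # Cost: assumed linear in iterations and data size, with constants c1, c2, c3.
    cost = (c1 * (i1 - j_bar) * increment + c2 * (i1 - j_bar)
            + c1 * j_bar * next_size + c2 * j_bar + c3)
    if cost <= 0:
        raise ValueError("cost estimate must be positive")
    return benefit / cost


def should_stop(ratio: float, lam: float) -> bool:
    """Stop enlarging the subset once the ratio falls below the user threshold."""
    return ratio < lam


ratio = stopping_ratio(ll_curr=-100.0, ll_prev=-110.0, ll_base=-200.0,
                       c1=0.001, c2=0.1, c3=1.0, i1=20, j_bar=5,
                       next_size=2000, increment=1000)
print(ratio, should_stop(ratio, lam=0.01))
```

The numeric inputs are made-up values chosen only to exercise the function.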

7. (Currently amended): The system of claim 4, wherein the stopping
criterion is defined by 



where 

$l(D_{ho} \mid \theta(D_n))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a current subset of the training data set,

$l(D_{ho} \mid \theta(D_{n-1}))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a previous subset of the training data set,

$l(D_{ho} \mid \theta_{base}(D_n))$ is a log likelihood for holdout data evaluated for a base
model,

$\delta$ is an offset associated with a difference in log likelihood for holdout
data when evaluated for models built on a first subset of the training data set by
the respective first and second training algorithms,

$c_1$, $c_2$, and $c_3$ are constants determined based on application of the second
training algorithm relative to a first subset of the data set,

$I_1$ is a number of iterations for the second training algorithm, when applied
to the first subset,

$\bar{J}_n = \frac{1}{n} \sum_{i=1}^{n} J_i$, and

$J_i$ is the number of iterations for the first training algorithm when applied
to a data subset $D_i$,

$|D_{n+1}|$ is the size of data set $D_{n+1}$,

$|\Delta D_{n+1}|$ is the increment in size $|D_{n+1}| - |D_n|$, and

$\lambda$ is a user determined stopping threshold.

8. (Currently amended): The system of claim 1, wherein the first training
algorithm further comprises an iterative algorithm, which builds the rough model for the 
subset of the data set according to an associated training policy. 

9. (Currently amended): The system of claim 8, wherein the first training
algorithm further comprises an associated training policy that defines parameter 
initialization of the first training algorithm for each subset of the data set. 

10. (Currently amended): The system of claim 9, wherein the training policy
associated with the first training algorithm further controls parameter initialization of the 
first training algorithm, such that at least some of the parameters computed for a previous 
subset of the data are employed to initialize the first training algorithm for a subsequent 
larger aggregate subset of the data. 

11. (Currently amended): The system of claim 9, wherein the first training
algorithm is initialized by the same parameter values for each subset of the data subset. 

12. (Currently amended): The system of claim 9, wherein the training policy
sets the iterative algorithm to perform a fixed number of at least one iteration. 

13. (Currently amended): The system of claim 12, wherein the training policy
sets the iterative algorithm to perform a single iteration. 

14. (Currently amended): The system of claim 12, wherein the second
training algorithm further comprises an iterative algorithm that operates according to an 
associated training policy, so as to produce a more accurate model for the appropriate 
subset of the data set than the first training algorithm. 

15. (Currently amended): The system of claim 14, wherein the iterative
algorithm associated with at least one of the first and second training algorithms is an 
Expectation and Maximization algorithm. 

16. (Currently amended): The system of claim 8, wherein the training policy
associated with the iterative algorithm of the first training algorithm controls the iterative 
algorithm to run until an associated convergence criterion is satisfied. 

17. (Currently amended): The system of claim 16, wherein second training
algorithm further comprises an iterative algorithm, which builds the refined model for the 
appropriate subset of the data set according to an associated training policy. 

18. (Currently amended): The system of claim 17, wherein the training policy
associated with the iterative algorithm of the second training algorithm controls the
respective iterative algorithm to run until an associated convergence criterion is satisfied,
wherein the convergence criterion associated with the second training algorithm provides
improved model quality relative to the convergence criterion associated with the first 
training algorithm. 

19. (Previously presented): A computer implemented system programmed to 
facilitate building a statistical model, comprising: 

a first parameter estimation algorithm that efficiently builds a rough model 
from a subset of a computer readable data set based on a training policy associated 
therewith; and 

an evaluation component that determines whether the subset of data from 
which the rough model was built is an appropriate size for building the statistical model 
to characterize the data set; 

a second parameter estimation algorithm that builds a refined model for 
the data set from the subset if determined to have the appropriate size, the second
parameter estimation algorithm having an associated training policy, which enables the 
second parameter estimation algorithm to build a more accurate model than the first 
parameter estimation algorithm. 






07/11/2005 15:22 FAX 216 696 8731 AMIN & TUROCY LLP



09/873,719 : MS1S8346.01/MSFTP184US 

20. (Previously presented): The system of claim 19, further comprising a data
scheduler that increases the size of the subset of the data set to provide a larger aggregate 
subset of the data set if the rough model is unacceptable, the first parameter estimation 
algorithm efficiently builds a rough model for each larger aggregate subset until a 
resulting rough model built therefrom is determined to be acceptable. 

21. (Currently amended): The system of claim 19, wherein the first parameter
estimation algorithm further comprises an iterative algorithm that builds the rough model 
for each subset of the data set according to the associated training policy. 

22. (Currently amended): The system of claim 21, wherein the training policy
for the first parameter estimation algorithm is operative to control parameter initialization
for the first parameter estimation algorithm, such that at least some of the parameters
computed for a previous subset of the data are employed to initialize the first parameter
estimation algorithm for a subsequent larger aggregate subset of the data set.

23. (Currently amended): The system of claim 21, wherein the first parameter
estimation algorithm is initialized by the same parameter values for each subset of the 
data subset. 

24. (Currently amended): The system of claim 21, wherein the training policy
associated with first parameter estimation algorithm controls the iterative algorithm of the 
first parameter estimation algorithm to perform a fixed number of at least one iteration, 
the second training algorithm further comprising an iterative algorithm, which is 
operative to perform a greater number of iterations than the iterative algorithm of the first 
training algorithm based on a training policy associated with the second parameter 
estimation algorithm. 

25. (Currently amended): The system of claim 21, wherein the training policy
associated with the iterative algorithm of the first parameter estimation algorithm controls
the iterative algorithm to run until an associated convergence threshold is satisfied,
wherein the second training algorithm further comprises an iterative algorithm, the
training policy associated with the iterative algorithm of the second parameter estimation 
algorithm being operative to control the respective iterative algorithm to run until an 
associated convergence threshold is satisfied, the convergence threshold associated with 
the second parameter estimation algorithm is less than the convergence threshold 
associated with the first parameter estimation algorithm. 

26. (Currently amended): The system of claim 19, wherein the evaluation
component determines whether the subset of data for which the rough model was built is 
an appropriate size based on a stopping criterion, which is functionally related to an 
expected incremental benefit and an expected incremental cost associated with increasing 
size of the subset of data. 



27. (Currently amended): The system of claim 26, wherein the cost of the
stopping criterion is functionally related to at least one of time associated with evaluating 
the model for a larger subset of data and size of the larger subset of the data. 



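28. (Currently amended): The system of claim 26, wherein the stopping
criterion is defined by

$$
\left( \frac{l(D_{ho} \mid \theta(D_n)) - l(D_{ho} \mid \theta(D_{n-1}))}{l(D_{ho} \mid \theta(D_n)) - l(D_{ho} \mid \theta_{base}(D_n))} \right) \cdot \frac{1}{c_1 (I_1 - \bar{J}_n)\,|\Delta D_{n+1}| + c_2 (I_1 - \bar{J}_n) + c_1 \bar{J}_n\,|D_{n+1}| + c_2 \bar{J}_n + c_3} < \lambda
$$
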
where 

$l(D_{ho} \mid \theta(D_n))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a current subset of the training data set,

$l(D_{ho} \mid \theta(D_{n-1}))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a previous subset of the training data set,

$l(D_{ho} \mid \theta_{base}(D_n))$ is a log likelihood for holdout data evaluated for a base
model,

$c_1$, $c_2$, and $c_3$ are constants determined based on application of the second
parameter estimation algorithm relative to a first subset of the data set,

$I_1$ is a number of iterations for the second parameter estimation algorithm,
when applied to the first subset,

$\bar{J}_n = \frac{1}{n} \sum_{i=1}^{n} J_i$, and

$J_i$ is the number of iterations for the first parameter estimation algorithm
when applied to a data subset $D_i$,

$|D_{n+1}|$ is the size of data set $D_{n+1}$,

$|\Delta D_{n+1}|$ is the increment in size $|D_{n+1}| - |D_n|$, and

$\lambda$ is a user determined stopping threshold.

29. (Currently amended): The system of claim 26, wherein the stopping
criterion is defined by 



where 

$l(D_{ho} \mid \theta(D_n))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a current subset of the training data set,

$l(D_{ho} \mid \theta(D_{n-1}))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a previous subset of the training data set,

$l(D_{ho} \mid \theta_{base}(D_n))$ is a log likelihood for holdout data evaluated for a base
model,

$\delta$ is an offset associated with a difference in log likelihood for holdout
data when evaluated for models built on a first subset of the training data set by
the respective first and second training algorithms,

$c_1$, $c_2$, and $c_3$ are constants determined based on application of the second
parameter estimation algorithm relative to a first data subset of the data set,

$I_1$ is a number of iterations for the second parameter estimation algorithm,
when applied to a first data subset,

$\bar{J}_n = \frac{1}{n} \sum_{i=1}^{n} J_i$, and

$J_i$ is the number of iterations for the first parameter estimation algorithm
when applied to a data subset $D_i$,

$|D_{n+1}|$ is the size of data set $D_{n+1}$,

$|\Delta D_{n+1}|$ is the increment in size $|D_{n+1}| - |D_n|$, and

$\lambda$ is a user determined stopping threshold.

30. (Previously presented): A computer implemented learning curve method 
to facilitate building a statistical model, comprising: 

choosing a subset of a computer readable data set; 

employing a first training algorithm to build a rough model to characterize 

the subset; 

evaluating the rough model; 

if the rough model is unacceptable, repeatedly increasing the size of the 
subset of data to provide an aggregate data set, building another rough model to 
characterize the aggregate subset, and reevaluating the model; and 

if the model is acceptable, employing a second training algorithm to build 
a refined model based on the aggregate data set, the second training algorithm being 
different from the first training algorithm. 
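
By way of a non-limiting sketch, the sequence of steps recited in claim 30 could be arranged as the Python loop below. The function `learning_curve_fit`, the callables passed to it, and the toy mean-estimation "models" are all hypothetical stand-ins chosen for this illustration; the claims do not specify them.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def learning_curve_fit(
    train: List[float],
    sizes: Iterable[int],
    rough_fit: Callable[[List[float]], float],
    refined_fit: Callable[[List[float]], float],
    acceptable: Callable[[Optional[float], float], bool],
) -> Tuple[float, int]:
    """Grow the subset until the rough model is acceptable, then refine once."""
    subset: List[float] = []
    previous: Optional[float] = None
    for size in sizes:
        subset = train[:size]            # next larger aggregate subset
        rough = rough_fit(subset)        # cheap first training algorithm
        if acceptable(previous, rough):  # evaluate the rough model
            break
        previous = rough
    if not subset:
        raise ValueError("sizes must yield at least one positive subset size")
    # Costlier second training algorithm, run once on the chosen subset.
    return refined_fit(subset), len(subset)


# Toy stand-ins (assumptions): the "model" is just an estimate of the mean.
def rough_mean(xs: List[float]) -> float:
    sample = xs[::10]                    # looks at every tenth point only
    return sum(sample) / len(sample)


def exact_mean(xs: List[float]) -> float:
    return sum(xs) / len(xs)


def small_change(prev: Optional[float], cur: float) -> bool:
    return prev is not None and abs(cur - prev) < 0.5


train = [float((i * 37) % 101) for i in range(10_000)]
sizes = [100, 200, 400, 800, 1600, 3200, 6400, 10_000]
model, used = learning_curve_fit(train, sizes, rough_mean, exact_mean, small_change)
print(model, used)
```

In this sketch, if no rough model is ever judged acceptable, the refined model is built on the largest subset offered; that fallback is a design choice of the example rather than something recited in the claim.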



31. (Previously presented): The method of claim 30, further comprising
determining the acceptability of each rough model based on a stopping criterion 
functionally related to an expected incremental benefit and an expected incremental cost 
associated with increasing the size of the aggregate subset of the data set. 

32. (Currently amended): The system of claim 31, wherein the cost of the
stopping criterion is functionally related to at least one of time associated with evaluating 
an aggregate data subset of increased size and size of the aggregate subset of the data. 

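33. (Currently amended): The system of claim 31, wherein the stopping
criterion is defined by

$$
\left( \frac{l(D_{ho} \mid \theta(D_n)) - l(D_{ho} \mid \theta(D_{n-1}))}{l(D_{ho} \mid \theta(D_n)) - l(D_{ho} \mid \theta_{base}(D_n))} \right) \cdot \frac{1}{c_1 (I_1 - \bar{J}_n)\,|\Delta D_{n+1}| + c_2 (I_1 - \bar{J}_n) + c_1 \bar{J}_n\,|D_{n+1}| + c_2 \bar{J}_n + c_3} < \lambda
$$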

where 

$l(D_{ho} \mid \theta(D_n))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a current subset of the training data set,

$l(D_{ho} \mid \theta(D_{n-1}))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a previous subset of the training data set,

$l(D_{ho} \mid \theta_{base}(D_n))$ is a log likelihood for holdout data evaluated for a base
model,

$c_1$, $c_2$, and $c_3$ are constants determined based on application of the second
parameter estimation algorithm relative to a first subset of the data set,

$I_1$ is a number of iterations for the second parameter estimation algorithm,
when applied to the first subset,

$\bar{J}_n = \frac{1}{n} \sum_{i=1}^{n} J_i$, and

$J_i$ is a number of iterations for the first parameter estimation algorithm
when applied to a data subset $D_i$,

$|D_{n+1}|$ is a size of data set $D_{n+1}$,

$|\Delta D_{n+1}|$ is an increment in size $|D_{n+1}| - |D_n|$, and

$\lambda$ is a user determined stopping threshold.

34. (Currently amended): The system of claim 31, wherein the stopping
criterion is defined by 

( I(D k0 1 9(0^-1(0^ | QjD^)) } i 

where 

$l(D_{ho} \mid \theta(D_n))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a current subset of the training data set,

$l(D_{ho} \mid \theta(D_{n-1}))$ is a log likelihood for holdout data evaluated for the model
built by the first training algorithm on a previous subset of the training data set,

$l(D_{ho} \mid \theta_{base}(D_n))$ is a log likelihood for holdout data evaluated for a base
model,

$\delta$ is an offset associated with the difference in log likelihood for holdout
data when evaluated for models built on a first subset of the training data set by
the respective first and second training algorithms,

$c_1$, $c_2$, and $c_3$ are constants determined based on application of the second
parameter estimation algorithm relative to a first data subset of the data set,

$I_1$ is a number of iterations for the second parameter estimation algorithm,
when applied to a first data subset,

$\bar{J}_n = \frac{1}{n} \sum_{i=1}^{n} J_i$, and

$J_i$ is a number of iterations for the first parameter estimation algorithm
when applied to a data subset $D_i$,

$|D_{n+1}|$ is a size of data set $D_{n+1}$,

$|\Delta D_{n+1}|$ is an increment in size $|D_{n+1}| - |D_n|$, and

$\lambda$ is a user determined stopping threshold.

35. (Currently amended): The method of claim 30, wherein the first training
algorithm is more computationally efficient than the second training algorithm. 

36. (Currently amended): The method of claim 30, wherein each instance of
model building repeated until obtaining an acceptable model by the first training 
algorithm employs more efficient and less accurate model building than model building 
employed by the second training algorithm that occurs after obtaining the acceptable 
model. 

37. (Currently amended): The method of claim 36, wherein each instance of 
model building repeated until obtaining an acceptable model employs the first training 
algorithm as an iterative algorithm that is run to a first convergence criterion, the second 
training algorithm employing an iterative algorithm that is run to a second convergence 
criterion, which demands more iterations than the first convergence criterion in order to 
obtain convergence, so that the refined model is more accurate than the rough model built 
by the first training algorithm. 

38. (Currently amended): The method of claim 36, wherein each instance of
model building repeated until obtaining an acceptable model employs an iterative 
algorithm having a fixed number of at least one iteration, the second training algorithm 
employing an iterative algorithm having a greater number of iterations than the fixed 
number. 

39. (Original): The method of claim 30, further comprising controlling 
parameter initialization employed in each instance of building a model for the aggregate 
data set prior to obtaining an acceptable model. 

40. (Original): The method of claim 39, further comprising initializing the 
first training algorithm by the same parameter values for each subset. 

41. (Currently amended): The method of claim 39, wherein the controlling 
further comprises reusing at least some of the parameters computed from a previous 
instance of model building to initialize a subsequent instance of model building for a 
subsequent larger aggregate data set prior to obtaining an acceptable model. 

42. (Previously presented): A computer-readable medium having
computer-executable instructions for:

choosing a subset of a computer readable data set; 
building a rough model to characterize the subset based on an associated 
training policy; 

evaluating the rough model; 

if the rough model is unacceptable, repeatedly increasing the size of the 
subset of data to provide an aggregate data set, building a rough model to characterize the 
aggregate subset based on an associated training policy, and reevaluating the rough 
model; and 

building a refined model for the computer readable data set from the 
aggregate data set if the rough model is determined to be acceptable based on an 
associated training policy. 

43. (Original): The method of claim 42, further comprising determining the
acceptability of the model based on an expected incremental benefit relative to an 
expected incremental cost associated with increasing the size of the aggregate data set. 

44. (Previously presented): A computer implemented method to facilitate 
constructing a statistical model, comprising: 

separating computer readable data into holdout data and training data; 

determining a data subset from the training data by estimating model 
parameters according to a first training policy and evaluating the estimated model 
parameters relative to the holdout data set and repeating the estimation and evaluation of 
model parameters with a larger subset of the training data until an acceptable quality of 
the estimated model is established; and, 

subsequent to establishing the acceptable quality of the estimated model, 
using the determined data subset to improve the estimated model parameters by 
employing a second training policy that is more accurate than the first training policy. 

45. (Currently amended): The method of claim 44, wherein each estimation
of model parameters repeated until the acceptable quality of the estimated model is 
established further comprises employing an iterative algorithm that is run until a first 
convergence criterion is satisfied, the estimation of model parameters using the 
determined data subset further comprising an iterative algorithm that is run until a second 
convergence criterion is satisfied, which is operative to provide a better quality of model 
than the first convergence criterion. 

46. (Currently amended): The system of claim 45, wherein the first
convergence criterion causes the associated iterative algorithm to run until a first
convergence threshold is satisfied, wherein the second convergence criterion causes the
associated iterative algorithm to run until a second convergence threshold is satisfied, the 
second convergence threshold being less than the first convergence threshold. 

47. (Currently amended): The method of claim 45, wherein at least one of the
iterative algorithm run to the first convergence criterion and the iterative algorithm run to 
the second convergence criterion is an Expectation and Maximization algorithm. 

48. (Currently amended): The method of claim 44, wherein each estimation
of model parameters repeated until the acceptable quality of the estimated model is 
established employs an iterative algorithm having a fixed number of at least one iteration, 
the estimation of model parameters using the determined data subset further employing 
an iterative algorithm having a greater number of iterations than the fixed number. 

49. (Original): The method of claim 44, further comprising controlling 
parameter initialization employed in each estimation of model parameters repeated until 
determining an acceptable size for the determined data subset. 

50. (Currently amended): The method of claim 44, wherein the controlling
further comprises reusing at least some of the parameters computed from a previous 
estimation of model parameters to initialize a subsequent estimation of model parameters 
for a next larger subset of the training set. 

51. (Currently amended): The method of claim 44, wherein each estimation
of model parameters repeated until the acceptable quality of the estimated model is
established further comprises initializing the first training algorithm by the same
parameter values.

52. (Original): The method of claim 44, further comprising determining the 
acceptability of the estimated model based on an expected incremental benefit relative to 
a cost associated with increasing the size of the subset of the data set. 

53. (Previously presented): A computer-readable medium having computer- 
executable instructions for: 

separating computer readable data into holdout data and training data; 

determining a data subset from the training data by estimating model 
parameters according to a first training policy and evaluating the estimated model 
parameters relative to the holdout data set and repeating the estimation and evaluation of 
model parameters with a next successively larger subset of the training data set until an 
acceptable quality of the estimated model is established; and 

subsequent to establishing the acceptable quality of the estimated model, 
using the determined data subset to improve the estimated model parameters by 
employing a second training policy that is more accurate than the first training policy. 

54. (Previously presented): A computer implemented method to facilitate
constructing a statistical model, comprising: 

separating computer readable data into a holdout data set and a training 

data set; 

iteratively estimating model parameters for a subset of the training data set 
over a fixed number of iterations and evaluating the estimated model parameters relative 
to the holdout data set; 

repeating the estimation and evaluation of model parameters obtained with 
successively larger subsets of the training data set until an acceptable model quality is 
established; and 

after the acceptable model quality is established, iteratively estimating 
model parameters for the data subset, which provided the acceptable model quality, until 
a better quality of model is provided relative to a preceding estimation performed over 
the fixed number of iterations. 

55. (Currently amended): The method of claim 54, wherein at least one of the
iterative estimations employs an Expectation and Maximization algorithm. 

56. (Currently amended): The method of claim 54, wherein the estimation
that occurs after the acceptable model quality is established, further comprises employing 
an iterative algorithm having a greater number of iterations than the fixed number. 

57. (Currently amended): The method of claim 54, wherein the estimation of
model parameters after the acceptable model quality has been established further
comprises employing an iterative algorithm that is run until a convergence criterion is
satisfied, which is operative to provide a better quality of model with the data subset than
a preceding estimation employing the fixed number of iterations. 

58. (Original): The method of claim 54, further comprising controlling 
parameter initialization for each estimation of model parameters that occurs before the 
acceptable model quality has been established. 

59. (Currently amended): The method of claim 58, wherein each iterative
estimation until the acceptable model quality is established further comprises initializing 
the first training algorithm by the same parameter values. 

60. (Currently amended): The method of claim 58, wherein the controlling
further comprises reusing at least some of the parameters obtained in a previous 
estimation of model parameters to initialize a subsequent estimation of model parameters 
for a next larger subset of the training data set. 

61. (Original): The method of claim 54, further comprising determining the 
acceptability of the model based on an expected incremental benefit relative to an 
expected incremental cost associated with an increase in size of each larger training 
subset of the data set.

62. (Previously presented): A computer implemented method to facilitate 
constructing a statistical model, comprising: 

separating computer readable data into a holdout data set and a training 

data set; 

iteratively estimating model parameters for a subset of the training data set 
until a first convergence threshold is satisfied and evaluating the estimated model 
parameters relative to the holdout data set; 

repeating the estimation and evaluation of model parameters obtained with 
successively larger subsets of the training data set until determining a size of data subset 
that provides acceptable model parameters; and 

after determining the size of data subset that provides acceptable model 
parameters, iteratively estimating model parameters for a data subset of the acceptable 
size until a second convergence threshold is satisfied, the second convergence threshold 
being less than the first convergence threshold. 
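
As an illustrative sketch only, the relationship between the two convergence thresholds of claim 62 can be shown with a small Expectation and Maximization routine in Python. The mixture model used here (two one-dimensional Gaussians with unit variance and equal weights), the function names, and the threshold values are assumptions made for this example and are not taken from the claims.

```python
import math
import random
from typing import List, Tuple

Means = Tuple[float, float]


def log_likelihood(data: List[float], mu: Means) -> float:
    """Log likelihood under an equal-weight mixture of N(mu[0], 1) and N(mu[1], 1)."""
    norm = math.sqrt(2.0 * math.pi)
    return sum(
        math.log(0.5 * sum(math.exp(-0.5 * (x - m) ** 2) for m in mu) / norm)
        for x in data
    )


def em_step(data: List[float], mu: Means) -> Means:
    """One EM iteration: compute responsibilities, then re-estimate both means."""
    w0 = w1 = s0 = s1 = 0.0
    for x in data:
        a = math.exp(-0.5 * (x - mu[0]) ** 2)
        b = math.exp(-0.5 * (x - mu[1]) ** 2)
        r = a / (a + b)          # responsibility of component 0 for x
        w0 += r
        s0 += r * x
        w1 += 1.0 - r
        s1 += (1.0 - r) * x
    return (s0 / w0, s1 / w1)


def run_em(data: List[float], mu: Means, threshold: float,
           max_iter: int = 1000) -> Tuple[Means, float, int]:
    """Iterate until the log-likelihood improvement drops below the threshold."""
    ll = log_likelihood(data, mu)
    for iteration in range(1, max_iter + 1):
        mu = em_step(data, mu)
        new_ll = log_likelihood(data, mu)
        improvement, ll = new_ll - ll, new_ll
        if improvement < threshold:
            return mu, ll, iteration
    return mu, ll, max_iter


random.seed(0)
data = ([random.gauss(-2.0, 1.0) for _ in range(200)]
        + [random.gauss(2.0, 1.0) for _ in range(200)])
start = (-1.0, 1.0)
rough_mu, rough_ll, rough_iters = run_em(data, start, threshold=1e-2)     # first, looser threshold
fine_mu, fine_ll, fine_iters = run_em(data, start, threshold=1e-8)        # second, smaller threshold
print(rough_iters, fine_iters)
```

Because both runs follow the same sequence of iterates from the same starting point, the run with the smaller threshold can never stop earlier than the run with the larger one.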

63. (Previously presented): A computer implemented system to facilitate
building a statistical model for a computer readable data set, comprising: 

first means for building a rough model to characterize a subset of the 
computer readable data set; 

evaluation means for evaluating the acceptability of the rough model, the 
first means building another rough model for a larger subset of the data if the evaluation 
means determines that a prior rough model is unacceptable; and 

second means, which is different from the first means, for building a 
refined model from an aggregate subset of data that yielded the rough model deemed 
acceptable by the evaluation means. 

64. (Previously presented): A computer implemented system to facilitate 
building a statistical model for a computer readable data set, comprising: 

first means for estimating model parameters from a subset of the computer 
readable data set; 

means for evaluating the estimated model parameters relative to a holdout 
set of the data set; 

means for determining a data subset from the training data by causing the 
first means and the means for evaluating to respectively repeat estimation and evaluation 
of model parameters with a next successively larger subset of the training data set until an 
acceptable quality of the model parameters is established; and 

second means for estimating model parameters based on the determined
data subset to provide a more accurate estimation of model parameters than the first 
means. 